Search | VHL Regional Portal

1.

Identification of combinations of somatic mutations that predict cancer survival and immunotherapy benefit.

Gussow, Ayal B; Koonin, Eugene V; Auslander, Noam.

NAR Cancer ; 3(2): zcab017, 2021 Jun.

Article in English | MEDLINE | ID: mdl-34027407

ABSTRACT

Cancer evolves through the accumulation of somatic mutations over time. Although several methods have been developed to characterize mutational processes in cancers, these have not been specifically designed to identify mutational patterns that predict patient prognosis. Here we present CLICnet, a method that utilizes mutational data to cluster patients by survival rate. CLICnet employs Restricted Boltzmann Machines, a type of generative neural network, which allows for the capture of complex mutational patterns associated with patient survival in different cancer types. For some cancer types, clustering produced by CLICnet also predicts benefit from anti-PD1 immune checkpoint blockade therapy, whereas for other cancer types, the mutational processes associated with survival are different from those associated with the improved anti-PD1 survival benefit. Thus, CLICnet has the ability to systematically identify and catalogue combinations of mutations that predict cancer survival, unveiling intricate associations between mutations, survival, and immunotherapy benefit.

2.

Incorporating Machine Learning into Established Bioinformatics Frameworks.

Auslander, Noam; Gussow, Ayal B; Koonin, Eugene V.

Int J Mol Sci ; 22(6), 2021 Mar 12.

Article in English | MEDLINE | ID: mdl-33809353

ABSTRACT

The exponential growth of biomedical data in recent years has urged the application of numerous machine learning techniques to address emerging problems in biology and clinical research. By enabling the automatic feature extraction, selection, and generation of predictive models, these methods can be used to efficiently study complex biological systems. Machine learning techniques are frequently integrated with bioinformatic methods, as well as curated databases and biological networks, to enhance training and validation, identify the best interpretable features, and enable feature and model investigation. Here, we review recently developed methods that incorporate machine learning within the same framework with techniques from molecular evolution, protein structure analysis, systems biology, and disease genomics. We outline the challenges posed for machine learning, and, in particular, deep learning in biomedicine, and suggest unique opportunities for machine learning techniques integrated with established bioinformatics approaches to overcome some of these challenges.

Subjects

Computational Biology/trends; Databases, Factual/trends; Machine Learning/trends; Systems Biology/trends; Algorithms; Humans

3.

Thousands of previously unknown phages discovered in whole-community human gut metagenomes.

Benler, Sean; Yutin, Natalya; Antipov, Dmitry; Rayko, Mikhail; Shmakov, Sergey; Gussow, Ayal B; Pevzner, Pavel; Koonin, Eugene V.

Microbiome ; 9(1): 78, 2021 03 29.

Article in English | MEDLINE | ID: mdl-33781338

ABSTRACT

BACKGROUND: Double-stranded DNA bacteriophages (dsDNA phages) play pivotal roles in structuring human gut microbiomes; yet, the gut virome is far from being fully characterized, and additional groups of phages, including highly abundant ones, continue to be discovered by metagenome mining. A multilevel framework for taxonomic classification of viruses was recently adopted, facilitating the classification of phages into evolutionary informative taxonomic units based on hallmark genes. Together with advanced approaches for sequence assembly and powerful methods of sequence analysis, this revised framework offers the opportunity to discover and classify unknown phage taxa in the human gut. RESULTS: A search of human gut metagenomes for circular contigs encoding phage hallmark genes resulted in the identification of 3738 apparently complete phage genomes that represent 451 putative genera. Several of these phage genera are only distantly related to previously identified phages and are likely to found new families. Two of the candidate families, "Flandersviridae" and "Quimbyviridae", include some of the most common and abundant members of the human gut virome that infect Bacteroides, Parabacteroides, and Prevotella. The third proposed family, "Gratiaviridae," consists of less abundant phages that are distantly related to the families Autographiviridae, Drexlerviridae, and Chaseviridae. Analysis of CRISPR spacers indicates that phages of all three putative families infect bacteria of the phylum Bacteroidetes. Comparative genomic analysis of the three candidate phage families revealed features without precedent in phage genomes. Some "Quimbyviridae" phages possess Diversity-Generating Retroelements (DGRs) that generate hypervariable target genes nested within defense-related genes, whereas the previously known targets of phage-encoded DGRs are structural genes. Several "Flandersviridae" phages encode enzymes of the isoprenoid pathway, a lipid biosynthesis pathway that so far has not been known to be manipulated by phages. The "Gratiaviridae" phages encode a HipA-family protein kinase and glycosyltransferase, suggesting these phages modify the host cell wall, preventing superinfection by other phages. Hundreds of phages in these three and other families are shown to encode catalases and iron-sequestering enzymes that can be predicted to enhance cellular tolerance to reactive oxygen species. CONCLUSIONS: Analysis of phage genomes identified in whole-community human gut metagenomes resulted in the delineation of at least three new candidate families of Caudovirales and revealed diverse putative mechanisms underlying phage-host interactions in the human gut. Addition of these phylogenetically classified, diverse, and distinct phages to public databases will facilitate taxonomic decomposition and functional characterization of human gut viromes.

Subjects

Bacteriophages; Gastrointestinal Microbiome; Microbiota; Bacteria/genetics; Bacteriophages/genetics; Gastrointestinal Microbiome/genetics; Genome, Viral/genetics; Humans; Metagenome; Phylogeny

4.

Prioritizing non-coding regions based on human genomic constraint and sequence context with deep learning.

Vitsios, Dimitrios; Dhindsa, Ryan S; Middleton, Lawrence; Gussow, Ayal B; Petrovski, Slavé.

Nat Commun ; 12(1): 1504, 2021 03 08.

Article in English | MEDLINE | ID: mdl-33686085

ABSTRACT

Elucidating functionality in non-coding regions is a key challenge in human genomics. It has been shown that intolerance to variation of coding and proximal non-coding sequence is a strong predictor of human disease relevance. Here, we integrate intolerance to variation, functional genomic annotations and primary genomic sequence to build JARVIS: a comprehensive deep learning model to prioritize non-coding regions, outperforming other human lineage-specific scores. Despite being agnostic to evolutionary conservation, JARVIS performs comparably to or outperforms conservation-based scores in classifying pathogenic single-nucleotide and structural variants. In constructing JARVIS, we introduce the genome-wide residual variation intolerance score (gwRVIS), applying a sliding-window approach to whole genome sequencing data from 62,784 individuals. gwRVIS distinguishes Mendelian disease genes from more tolerant CCDS regions and highlights ultra-conserved non-coding elements as the most intolerant regions in the human genome. Both JARVIS and gwRVIS capture previously inaccessible human-lineage constraint information and will enhance our understanding of the non-coding genome.
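The sliding-window step mentioned in the abstract can be illustrated with a minimal sketch. It only tallies variants per window over a toy region; the function name, window size, and step are hypothetical choices for illustration, and the abstract does not state the parameters or scoring that gwRVIS actually uses.

```python
from typing import List, Sequence, Tuple


def window_variant_counts(
    positions: Sequence[int], region_length: int, window: int, step: int
) -> List[Tuple[int, int]]:
    """Count variants in half-open windows [start, start + window).

    `window` and `step` are illustrative assumptions, not values
    taken from the abstract.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    counts = []
    # Slide the window across the region, keeping only full windows.
    for start in range(0, region_length - window + 1, step):
        end = start + window
        n = sum(1 for p in positions if start <= p < end)
        counts.append((start, n))
    return counts


if __name__ == "__main__":
    # Toy variant positions in a 20-base region (0-based coordinates).
    print(window_variant_counts([1, 2, 10, 15], 20, window=10, step=5))
```

Per-window counts like these would be the input to whatever intolerance score is then fitted; that scoring step is not sketched here.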

Subjects

Deep Learning; Genome, Human; Genomics; DNA, Intergenic; Genetic Variation; Humans; Sequence Analysis, DNA; Whole Genome Sequencing

5.

Prediction of the incubation period for COVID-19 and future virus disease outbreaks.

Gussow, Ayal B; Auslander, Noam; Wolf, Yuri I; Koonin, Eugene V.

BMC Biol ; 18(1): 186, 2020 11 30.

Article in English | MEDLINE | ID: mdl-33256718

ABSTRACT

BACKGROUND: A crucial factor in mitigating respiratory viral outbreaks is early determination of the duration of the incubation period and, accordingly, the required quarantine time for potentially exposed individuals. At the time of the COVID-19 pandemic, optimization of quarantine regimes becomes paramount for public health, societal well-being, and the global economy. However, biological factors that determine the duration of the virus incubation period remain poorly understood. RESULTS: We demonstrate a strong positive correlation between the length of the incubation period and disease severity for a wide range of human pathogenic viruses. Using a machine learning approach, we develop a predictive model that accurately estimates, solely from several virus genome features, in particular, the number of protein-coding genes and the GC content, the incubation time ranges for diverse human pathogenic RNA viruses including SARS-CoV-2. The predictive approach described here can directly help in establishing the appropriate quarantine durations and thus facilitate controlling future outbreaks. CONCLUSIONS: The length of the incubation period in viral diseases strongly correlates with disease severity, emphasizing the biological and epidemiological importance of the incubation period. Perhaps surprisingly, incubation times of pathogenic RNA viruses can be accurately predicted solely from generic features of virus genomes. Elucidation of the biological underpinnings of the connections between these features and disease progression can be expected to reveal key aspects of virus pathogenesis.
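The two genome features named in the abstract, GC content and the number of protein-coding genes, are simple to compute. The sketch below shows one way to assemble them into a feature pair. The function names are hypothetical, ambiguous bases are skipped by assumption, and the predictive model itself is not specified in the abstract, so it is not reproduced here.

```python
from typing import Tuple


def gc_content(sequence: str) -> float:
    """Fraction of unambiguous bases (A, C, G, T/U) that are G or C."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T") + seq.count("U")
    total = gc + at  # ambiguous symbols such as N are ignored (assumption)
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return gc / total


def genome_features(sequence: str, n_protein_coding_genes: int) -> Tuple[float, float]:
    """Return (gene count, GC content) as a feature pair for a downstream model."""
    return (float(n_protein_coding_genes), gc_content(sequence))


if __name__ == "__main__":
    # Toy sequence and gene count, for illustration only.
    print(genome_features("ATGCGCNNAUGG", 11))
```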

Subjects

COVID-19/pathology; COVID-19/virology; Infectious Disease Incubation Period; SARS-CoV-2/genetics; Computer Simulation; Genome, Viral; Humans; Models, Biological; Mutation; Quarantine

6.

Seeker: alignment-free identification of bacteriophage genomes by deep learning.

Auslander, Noam; Gussow, Ayal B; Benler, Sean; Wolf, Yuri I; Koonin, Eugene V.

Nucleic Acids Res ; 48(21): e121, 2020 12 02.

Article in English | MEDLINE | ID: mdl-33045744

ABSTRACT

Recent advances in metagenomic sequencing have enabled discovery of diverse, distinct microbes and viruses. Bacteriophages, the most abundant biological entity on Earth, evolve rapidly, and therefore, detection of unknown bacteriophages in sequence datasets is a challenge. Most of the existing detection methods rely on sequence similarity to known bacteriophage sequences, impeding the identification and characterization of distinct, highly divergent bacteriophage families. Here we present Seeker, a deep-learning tool for alignment-free identification of phage sequences. Seeker allows rapid detection of phages in sequence datasets and differentiation of phage sequences from bacterial ones, even when those phages exhibit little sequence similarity to established phage families. We comprehensively validate Seeker's ability to identify previously unidentified phages, and employ this method to detect unknown phages, some of which are highly divergent from the known phage families. We provide a web portal (seeker.pythonanywhere.com) and a user-friendly Python package (github.com/gussow/seeker) allowing researchers to easily apply Seeker in metagenomic studies, for the detection of diverse unknown bacteriophages.

Subjects

Bacteria/virology; Bacteriophages/genetics; DNA, Viral/genetics; Genome, Viral; Metagenome; Software; Bacteria/genetics; Bacteriophages/classification; Biological Evolution; Deep Learning; Humans; Metagenomics/methods; Phylogeny; Sequence Analysis, DNA

7.

Evolutionary and functional classification of the CARF domain superfamily, key sensors in prokaryotic antivirus defense.

Makarova, Kira S; Timinskas, Albertas; Wolf, Yuri I; Gussow, Ayal B; Siksnys, Virginijus; Venclovas, Ceslovas; Koonin, Eugene V.

Nucleic Acids Res ; 48(16): 8828-8847, 2020 09 18.

Article in English | MEDLINE | ID: mdl-32735657

ABSTRACT

CRISPR-associated Rossmann Fold (CARF) and SMODS-associated and fused to various effector domains (SAVED) are key components of cyclic oligonucleotide-based antiphage signaling systems (CBASS) that sense cyclic oligonucleotides and transmit the signal to an effector inducing cell dormancy or death. Most of the CARFs are components of a CBASS built into type III CRISPR-Cas systems, where the CARF domain binds cyclic oligoA (cOA) synthesized by Cas10 polymerase-cyclase and allosterically activates the effector, typically a promiscuous ribonuclease. Additionally, this signaling pathway includes a ring nuclease, often also a CARF domain (either the sensor itself or a specialized enzyme) that cleaves cOA and mitigates dormancy or death induction. We present a comprehensive census of CARF and SAVED domains in bacteria and archaea, and their sequence- and structure-based classification. There are 10 major families of CARF domains and multiple smaller groups that differ in structural features, association with distinct effectors, and presence or absence of the ring nuclease activity. By comparative genome analysis, we predict specific functions of CARF and SAVED domains and partition the CARF domains into those with both sensor and ring nuclease functions, and sensor-only ones. Several families of ring nucleases functionally associated with sensor-only CARF domains are also predicted.

Subjects

Archaea/genetics; Archaeal Proteins/genetics; Bacteria/genetics; Bacterial Proteins/genetics; CRISPR-Cas Systems; Protein Domains; Archaea/enzymology; Archaeal Proteins/chemistry; Bacteria/enzymology; Bacterial Proteins/chemistry; Evolution, Molecular

8.

Machine-learning approach expands the repertoire of anti-CRISPR protein families.

Gussow, Ayal B; Park, Allyson E; Borges, Adair L; Shmakov, Sergey A; Makarova, Kira S; Wolf, Yuri I; Bondy-Denomy, Joseph; Koonin, Eugene V.

Nat Commun ; 11(1): 3784, 2020 07 29.

Article in English | MEDLINE | ID: mdl-32728052

ABSTRACT

CRISPR-Cas are adaptive bacterial and archaeal immunity systems that have been harnessed for the development of powerful genome editing and engineering tools. In the incessant host-parasite arms race, viruses evolved multiple anti-defense mechanisms including diverse anti-CRISPR proteins (Acrs) that specifically inhibit CRISPR-Cas and therefore have enormous potential for application as modulators of genome editing tools. Most Acrs are small and highly variable proteins, which makes their bioinformatic prediction a formidable task. We present a machine-learning approach for comprehensive Acr prediction. The model shows high predictive power when tested against an unseen test set and was employed to predict 2,500 candidate Acr families. Experimental validation of top candidates revealed two unknown Acrs (AcrIC9, IC10) and three other top candidates were coincidentally identified and found to possess anti-CRISPR activity. These results substantially expand the repertoire of predicted Acrs and provide a resource for experimental Acr discovery.

Subjects

Bacteriophages/genetics; CRISPR-Associated Protein 9/antagonists & inhibitors; Machine Learning; Sequence Analysis, Protein/methods; Viral Proteins/genetics; Archaea/genetics; Archaea/virology; Bacteria/genetics; Bacteria/virology; CRISPR-Associated Protein 9/genetics; CRISPR-Cas Systems/genetics; Computational Biology/methods; Datasets as Topic; Gene Editing/methods; Host-Parasite Interactions/genetics; Sequence Homology, Amino Acid

9.

Genomic determinants of pathogenicity in SARS-CoV-2 and other human coronaviruses.

Gussow, Ayal B; Auslander, Noam; Faure, Guilhem; Wolf, Yuri I; Zhang, Feng; Koonin, Eugene V.

Proc Natl Acad Sci U S A ; 117(26): 15193-15199, 2020 06 30.

Article in English | MEDLINE | ID: mdl-32522874

ABSTRACT

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) poses an immediate, major threat to public health across the globe. Here we report an in-depth molecular analysis to reconstruct the evolutionary origins of the enhanced pathogenicity of SARS-CoV-2 and other coronaviruses that are severe human pathogens. Using integrated comparative genomics and machine learning techniques, we identify key genomic features that differentiate SARS-CoV-2 and the viruses behind the two previous deadly coronavirus outbreaks, SARS-CoV and Middle East respiratory syndrome coronavirus (MERS-CoV), from less pathogenic coronaviruses. These features include enhancement of the nuclear localization signals in the nucleocapsid protein and distinct inserts in the spike glycoprotein that appear to be associated with high case fatality rate of these coronaviruses as well as the host switch from animals to humans. The identified features could be crucial contributors to coronavirus pathogenicity and possible targets for diagnostics, prognostication, and interventions.

Subjects

Betacoronavirus/genetics; Evolution, Molecular; Genome, Viral; Nucleocapsid Proteins/genetics; Spike Glycoprotein, Coronavirus/genetics; Animals; Betacoronavirus/classification; Betacoronavirus/pathogenicity; Host Specificity; Humans; Machine Learning; Middle East Respiratory Syndrome Coronavirus/classification; Middle East Respiratory Syndrome Coronavirus/genetics; Middle East Respiratory Syndrome Coronavirus/pathogenicity; Mutagenesis, Insertional; Nuclear Localization Signals/genetics; Nucleocapsid Proteins/chemistry; Phylogeny; SARS-CoV-2; Sequence Homology; Spike Glycoprotein, Coronavirus/chemistry; Virulence/genetics

10.

Genomic determinants of pathogenicity in SARS-CoV-2 and other human coronaviruses.

Gussow, Ayal B; Auslander, Noam; Faure, Guilhem; Wolf, Yuri I; Zhang, Feng; Koonin, Eugene V.

bioRxiv ; 2020 Apr 09.

Article in English | MEDLINE | ID: mdl-32511301

ABSTRACT

SARS-CoV-2 poses an immediate, major threat to public health across the globe. Here we report an in-depth molecular analysis to reconstruct the evolutionary origins of the enhanced pathogenicity of SARS-CoV-2 and other coronaviruses that are severe human pathogens. Using integrated comparative genomics and machine learning techniques, we identify key genomic features that differentiate SARS-CoV-2 and the viruses behind the two previous deadly coronavirus outbreaks, SARS-CoV and MERS-CoV, from less pathogenic coronaviruses. These features include enhancement of the nuclear localization signals in the nucleocapsid protein and distinct inserts in the spike glycoprotein that appear to be associated with high case fatality rate of these coronaviruses as well as the host switch from animals to humans. The identified features could be crucial elements of coronavirus pathogenicity and possible targets for diagnostics, prognostication and interventions.

11.

Correction: Orion: Detecting regions of the human non-coding genome that are intolerant to variation using population genetics.

Gussow, Ayal B; Copeland, Brett R; Dhindsa, Ryan S; Wang, Quanli; Petrovski, Slavé; Majoros, William H; Allen, Andrew S; Goldstein, David B.

PLoS One ; 13(1): e0191298, 2018.

Article in English | MEDLINE | ID: mdl-29324863

ABSTRACT

[This corrects the article DOI: 10.1371/journal.pone.0181604.].

12.

Discovery of an expansive bacteriophage family that includes the most abundant viruses from the human gut.

Yutin, Natalya; Makarova, Kira S; Gussow, Ayal B; Krupovic, Mart; Segall, Anca; Edwards, Robert A; Koonin, Eugene V.

Nat Microbiol ; 3(1): 38-46, 2018 Jan.

Article in English | MEDLINE | ID: mdl-29133882

ABSTRACT

Metagenomic sequence analysis is rapidly becoming the primary source of virus discovery [1-3]. A substantial majority of the currently available virus genomes come from metagenomics, and some of these represent extremely abundant viruses, even if never grown in the laboratory. A particularly striking case of a virus discovered via metagenomics is crAssphage, which is by far the most abundant human-associated virus known, comprising up to 90% of sequences in the gut virome [4]. Over 80% of the predicted proteins encoded in the approximately 100 kilobase crAssphage genome showed no significant similarity to available protein sequences, precluding classification of this virus and hampering further study. Here we combine a comprehensive search of genomic and metagenomic databases with sensitive methods for protein sequence analysis to identify an expansive, diverse group of bacteriophages related to crAssphage and predict the functions of the majority of phage proteins, in particular those that comprise the structural, replication and expression modules. Most, if not all, of the crAss-like phages appear to be associated with diverse bacteria from the phylum Bacteroidetes, which includes some of the most abundant bacteria in the human gut microbiome and that are also common in various other habitats. These findings provide for experimental characterization of the most abundant but poorly understood members of the human-associated virome.

Subjects

Bacteriophages/classification; Bacteriophages/genetics; Gastrointestinal Microbiome/genetics; Genomics; Metagenomics; Bacteroidetes/virology; Databases, Protein; Genome, Viral/genetics; Humans; Models, Genetic; Molecular Sequence Data; Phylogeny; Sequence Analysis, Protein; Viral Proteins/chemistry; Viral Proteins/genetics

13.

Orion: Detecting regions of the human non-coding genome that are intolerant to variation using population genetics.

Gussow, Ayal B; Copeland, Brett R; Dhindsa, Ryan S; Wang, Quanli; Petrovski, Slavé; Majoros, William H; Allen, Andrew S; Goldstein, David B.

PLoS One ; 12(8): e0181604, 2017.

Article in English | MEDLINE | ID: mdl-28797091

ABSTRACT

There is broad agreement that genetic mutations occurring outside of the protein-coding regions play a key role in human disease. Despite this consensus, we are not yet capable of discerning which portions of non-coding sequence are important in the context of human disease. Here, we present Orion, an approach that detects regions of the non-coding genome that are depleted of variation, suggesting that the regions are intolerant of mutations and subject to purifying selection in the human lineage. We show that Orion is highly correlated with known intolerant regions as well as regions that harbor putatively pathogenic variation. This approach provides a mechanism to identify pathogenic variation in the human non-coding genome and will have immediate utility in the diagnostic interpretation of patient genomes and in large case control studies using whole-genome sequences.

Subjects

Genetic Variation; Genome, Human; Genetic Predisposition to Disease; Genetics, Population; Humans; Models, Genetic; Mutation; Open Reading Frames; Selection, Genetic

14.

Inhibition of microRNA 128 promotes excitability of cultured cortical neuronal networks.

McSweeney, K Melodi; Gussow, Ayal B; Bradrick, Shelton S; Dugger, Sarah A; Gelfman, Sahar; Wang, Quanli; Petrovski, Slavé; Frankel, Wayne N; Boland, Michael J; Goldstein, David B.

Genome Res ; 26(10): 1411-1416, 2016 10.

Article in English | MEDLINE | ID: mdl-27516621

ABSTRACT

Cultured neuronal networks monitored with microelectrode arrays (MEAs) have been used widely to evaluate pharmaceutical compounds for potential neurotoxic effects. A newer application of MEAs has been in the development of in vitro models of neurological disease. Here, we directly evaluated the utility of MEAs to recapitulate in vivo phenotypes of mature microRNA-128 (miR-128) deficiency, which causes fatal seizures in mice. We show that inhibition of miR-128 results in significantly increased neuronal activity in cultured neuronal networks derived from primary mouse cortical neurons. These results support the utility of MEAs in developing in vitro models of neuroexcitability disorders, such as epilepsy, and further suggest that MEAs provide an effective tool for the rapid identification of microRNAs that promote seizures when dysregulated.

Subjects

Action Potentials; MicroRNAs/genetics; Neurons/physiology; Patch-Clamp Techniques/methods; Seizures/genetics; Tissue Array Analysis/methods; Animals; Cells, Cultured; Cerebral Cortex/cytology; Mice; Mice, Inbred C57BL; Neurons/metabolism; Seizures/physiopathology

15.

The intolerance to functional genetic variation of protein domains predicts the localization of pathogenic mutations within genes.

Gussow, Ayal B; Petrovski, Slavé; Wang, Quanli; Allen, Andrew S; Goldstein, David B.

Genome Biol ; 17: 9, 2016 Jan 18.

Article in English | MEDLINE | ID: mdl-26781712

ABSTRACT

Ranking human genes based on their tolerance to functional genetic variation can greatly facilitate patient genome interpretation. It is well established, however, that different parts of proteins can have different functions, suggesting that it will ultimately be more informative to focus attention on functionally distinct portions of genes. Here we evaluate the intolerance of genic sub-regions using two biological sub-region classifications. We show that the intolerance scores of these sub-regions significantly correlate with reported pathogenic mutations. This observation extends the utility of intolerance scores to indicating where pathogenic mutations are most likely to fall within genes.

Subjects

Genetic Variation; Genome, Human; Protein Structure, Tertiary/genetics; Exons/genetics; Humans; Mutation; Open Reading Frames/genetics

16.

The Intolerance of Regulatory Sequence to Genetic Variation Predicts Gene Dosage Sensitivity.

Petrovski, Slavé; Gussow, Ayal B; Wang, Quanli; Halvorsen, Matt; Han, Yujun; Weir, William H; Allen, Andrew S; Goldstein, David B.

PLoS Genet ; 11(9): e1005492, 2015 Sep.

Article in English | MEDLINE | ID: mdl-26332131

ABSTRACT

Noncoding sequence contains pathogenic mutations. Yet, compared with mutations in protein-coding sequence, pathogenic regulatory mutations are notoriously difficult to recognize. Most fundamentally, we are not yet adept at recognizing the sequence stretches in the human genome that are most important in regulating the expression of genes. For this reason, it is difficult to apply to the regulatory regions the same kinds of analytical paradigms that are being successfully applied to identify mutations among protein-coding regions that influence risk. To determine whether dosage sensitive genes have distinct patterns among their noncoding sequence, we present two primary approaches that focus solely on a gene's proximal noncoding regulatory sequence. The first approach is a regulatory sequence analogue of the recently introduced residual variation intolerance score (RVIS), termed noncoding RVIS, or ncRVIS. The ncRVIS compares observed and predicted levels of standing variation in the regulatory sequence of human genes. The second approach, termed ncGERP, reflects the phylogenetic conservation of a gene's regulatory sequence using GERP++. We assess how well these two approaches correlate with four gene lists that use different ways to identify genes known or likely to cause disease through changes in expression: 1) genes that are known to cause disease through haploinsufficiency, 2) genes curated as dosage sensitive in ClinGen's Genome Dosage Map, 3) genes judged likely to be under purifying selection for mutations that change expression levels because they are statistically depleted of loss-of-function variants in the general population, and 4) genes judged unlikely to cause disease based on the presence of copy number variants in the general population. We find that both noncoding scores are highly predictive of dosage sensitivity using any of these criteria. In a similar way to ncGERP, we assess two ensemble-based predictors of regional noncoding importance, ncCADD and ncGWAVA, and find both scores are significantly predictive of human dosage sensitive genes and appear to carry information beyond conservation, as assessed by ncGERP. These results highlight that the intolerance of noncoding sequence stretches in the human genome can provide a critical complementary tool to other genome annotation approaches to help identify the parts of the human genome increasingly likely to harbor mutations that influence risk of disease.
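The abstract describes ncRVIS as comparing observed and predicted levels of standing variation. One common way to express such a comparison is a regression residual, sketched below. The choice of predictor (total variant count per gene), the plain least-squares fit, and the simple standardization are assumptions made for illustration; the abstract does not give the exact ncRVIS definition, and the function name is hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def residual_scores(total: Sequence[float], observed: Sequence[float]) -> List[float]:
    """Standardized residuals of `observed` regressed on `total`.

    Negative scores mean less variation than the fitted line predicts,
    i.e. a more intolerant region under this toy model.
    """
    if len(total) != len(observed) or len(total) < 3:
        raise ValueError("need two equal-length inputs with at least 3 genes")
    mx, my = mean(total), mean(observed)
    sxx = sum((x - mx) ** 2 for x in total)
    if sxx == 0:
        raise ValueError("predictor has no variance")
    # Ordinary least-squares slope and intercept.
    slope = sum((x - mx) * (y - my) for x, y in zip(total, observed)) / sxx
    intercept = my - slope * mx
    residuals = [y - (intercept + slope * x) for x, y in zip(total, observed)]
    spread = pstdev(residuals)
    if spread == 0:
        return [0.0] * len(residuals)
    return [r / spread for r in residuals]


if __name__ == "__main__":
    # Toy data: the fourth gene carries fewer variants than its size predicts.
    print(residual_scores([10, 20, 30, 40], [5, 10, 15, 8]))
```

In this toy input the fourth gene receives the lowest score, which is the direction an intolerance score is meant to flag.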

Subjects

Gene Dosage; Genetic Variation; Regulatory Sequences, Nucleic Acid; DNA Copy Number Variations; Haploinsufficiency; Humans; Mental Disorders/genetics; Mutation; Nervous System Diseases/genetics
